27 research outputs found

    Testing bounded arboricity

    Full text link
    In this paper we consider the problem of testing whether a graph has bounded arboricity. The family of graphs with bounded arboricity includes, among others, bounded-degree graphs, all minor-closed graph classes (e.g. planar graphs, graphs with bounded treewidth) and randomly generated preferential attachment graphs. Graphs with bounded arboricity have been studied extensively in the past, in particular since for many problems they allow for much more efficient algorithms and/or better approximation ratios. We present a tolerant tester in the sparse-graphs model. The sparse-graphs model allows access to degree queries and neighbor queries, and the distance is defined with respect to the actual number of edges. More specifically, our algorithm distinguishes between graphs that are ϵ\epsilon-close to having arboricity α\alpha and graphs that cϵc \cdot \epsilon-far from having arboricity 3α3\alpha, where cc is an absolute small constant. The query complexity and running time of the algorithm are O~(nmlog(1/ϵ)ϵ+nαm(1ϵ)O(log(1/ϵ)))\tilde{O}\left(\frac{n}{\sqrt{m}}\cdot \frac{\log(1/\epsilon)}{\epsilon} + \frac{n\cdot \alpha}{m} \cdot \left(\frac{1}{\epsilon}\right)^{O(\log(1/\epsilon))}\right) where nn denotes the number of vertices and mm denotes the number of edges. In terms of the dependence on nn and mm this bound is optimal up to poly-logarithmic factors since Ω(n/m)\Omega(n/\sqrt{m}) queries are necessary (and α=O(m))\alpha = O(\sqrt{m})). We leave it as an open question whether the dependence on 1/ϵ1/\epsilon can be improved from quasi-polynomial to polynomial. Our techniques include an efficient local simulation for approximating the outcome of a global (almost) forest-decomposition algorithm as well as a tailored procedure of edge sampling

    On Sampling Edges Almost Uniformly

    Get PDF
    We consider the problem of sampling an edge almost uniformly from an unknown graph, G = (V, E). Access to the graph is provided via queries of the following types: (1) uniform vertex queries, (2) degree queries, and (3) neighbor queries. We describe a new simple algorithm that returns a random edge e in E using tilde{O}(n/sqrt{eps m}) queries in expectation, such that each edge e is sampled with probability (1 +/- eps)/m. Here, n = |V| is the number of vertices, and m = |E| is the number of edges. Our algorithm is optimal in the sense that any algorithm that samples an edge from an almost-uniform distribution must perform Omega(n/sqrt{m}) queries

    Lower Bounds for Approximating Graph Parameters via Communication Complexity

    Get PDF
    In a celebrated work, Blais, Brody, and Matulef [Blais et al., 2012] developed a technique for proving property testing lower bounds via reductions from communication complexity. Their work focused on testing properties of functions, and yielded new lower bounds as well as simplified analyses of known lower bounds. Here, we take a further step in generalizing the methodology of [Blais et al., 2012] to analyze the query complexity of graph parameter estimation problems. In particular, our technique decouples the lower bound arguments from the representation of the graph, allowing it to work with any query type. We illustrate our technique by providing new simpler proofs of previously known tight lower bounds for the query complexity of several graph problems: estimating the number of edges in a graph, sampling edges from an almost-uniform distribution, estimating the number of triangles (and more generally, r-cliques) in a graph, and estimating the moments of the degree distribution of a graph. We also prove new lower bounds for estimating the edge connectivity of a graph and estimating the number of instances of any fixed subgraph in a graph. We show that the lower bounds for estimating the number of triangles and edge connectivity also hold in a strictly stronger computational model that allows access to uniformly random edge samples

    Approximately Counting Triangles in Sublinear Time

    Full text link
    We consider the problem of estimating the number of triangles in a graph. This problem has been extensively studied in both theory and practice, but all existing algorithms read the entire graph. In this work we design a {\em sublinear-time\/} algorithm for approximating the number of triangles in a graph, where the algorithm is given query access to the graph. The allowed queries are degree queries, vertex-pair queries and neighbor queries. We show that for any given approximation parameter 0<ϵ<10<\epsilon<1, the algorithm provides an estimate t^\widehat{t} such that with high constant probability, (1ϵ)t<t^<(1+ϵ)t(1-\epsilon)\cdot t< \widehat{t}<(1+\epsilon)\cdot t, where tt is the number of triangles in the graph GG. The expected query complexity of the algorithm is  ⁣(nt1/3+min{m,m3/2t})poly(logn,1/ϵ)\!\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)\cdot {\rm poly}(\log n, 1/\epsilon), where nn is the number of vertices in the graph and mm is the number of edges, and the expected running time is  ⁣(nt1/3+m3/2t)poly(logn,1/ϵ)\!\left(\frac{n}{t^{1/3}} + \frac{m^{3/2}}{t}\right)\cdot {\rm poly}(\log n, 1/\epsilon). We also prove that Ω ⁣(nt1/3+min{m,m3/2t})\Omega\!\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right) queries are necessary, thus establishing that the query complexity of this algorithm is optimal up to polylogarithmic factors in nn (and the dependence on 1/ϵ1/\epsilon).Comment: To appear in the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015

    Provable and practical approximations for the degree distribution using sublinear graph samples

    Full text link
    The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. The degree distribution estimation poses a significant challenge, due to its heavy-tailed nature and the large variance in degrees. We design a new algorithm, SADDLES, for this problem, using recent mathematical techniques from the field of sublinear algorithms. The SADDLES algorithm gives provably accurate outputs for all values of the degree distribution. For the analysis, we define two fatness measures of the degree distribution, called the hh-index and the zz-index. We prove that SADDLES is sublinear in the graph size when these indices are large. A corollary of this result is a provably sublinear algorithm for any degree distribution bounded below by a power law. We deploy our new algorithm on a variety of real datasets and demonstrate its excellent empirical behavior. In all instances, we get extremely accurate approximations for all values in the degree distribution by observing at most 1%1\% of the vertices. This is a major improvement over the state-of-the-art sampling algorithms, which typically sample more than 10%10\% of the vertices to give comparable results. We also observe that the hh and zz-indices of real graphs are large, validating our theoretical analysis.Comment: Longer version of the WWW 2018 submissio

    Sublinear-Time Distributed Algorithms for Detecting Small Cliques and Even Cycles

    Get PDF
    In this paper we give sublinear-time distributed algorithms in the CONGEST model for subgraph detection for two classes of graphs: cliques and even-length cycles. We show for the first time that all copies of 4-cliques and 5-cliques in the network graph can be listed in sublinear time, O(n^{5/6+o(1)}) rounds and O(n^{21/22+o(1)}) rounds, respectively. Prior to our work, it was not known whether it was possible to even check if the network contains a 4-clique or a 5-clique in sublinear time. For even-length cycles, C_{2k}, we give an improved sublinear-time algorithm, which exploits a new connection to extremal combinatorics. For example, for 6-cycles we improve the running time from O~(n^{5/6}) to O~(n^{3/4}) rounds. We also show two obstacles on proving lower bounds for C_{2k}-freeness: First, we use the new connection to extremal combinatorics to show that the current lower bound of Omega~(sqrt{n}) rounds for 6-cycle freeness cannot be improved using partition-based reductions from 2-party communication complexity, the technique by which all known lower bounds on subgraph detection have been proven to date. Second, we show that there is some fixed constant delta in (0,1/2) such that for any k, a Omega(n^{1/2+delta}) lower bound on C_{2k}-freeness implies new lower bounds in circuit complexity. For general subgraphs, it was shown in [Orr Fischer et al., 2018] that for any fixed k, there exists a subgraph H of size k such that H-freeness requires Omega~(n^{2-Theta(1/k)}) rounds. It was left as an open problem whether this is tight, or whether some constant-sized subgraph requires truly quadratic time to detect. We show that in fact, for any subgraph H of constant size k, the H-freeness problem can be solved in O(n^{2 - Theta(1/k)}) rounds, nearly matching the lower bound of [Orr Fischer et al., 2018]

    Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection

    Get PDF
    We revisit the classic problem of estimating the degree distribution moments of an undirected graph. Consider an undirected graph G=(V,E) with n (non-isolated) vertices, and define (for s > 0) mu_s = 1n * sum_{v in V} d^s_v. Our aim is to estimate mu_s within a multiplicative error of (1+epsilon) (for a given approximation parameter epsilon>0) in sublinear time. We consider the sparse graph model that allows access to: uniform random vertices, queries for the degree of any vertex, and queries for a neighbor of any vertex. For the case of s=1 (the average degree), widetilde{O}(sqrt{n}) queries suffice for any constant epsilon (Feige, SICOMP 06 and Goldreich-Ron, RSA 08). Gonen-Ron-Shavitt (SIDMA 11) extended this result to all integral s > 0, by designing an algorithms that performs widetilde{O}(n^{1-1/(s+1)}) queries. (Strictly speaking, their algorithm approximates the number of star-subgraphs of a given size, but a slight modification gives an algorithm for moments.) We design a new, significantly simpler algorithm for this problem. In the worst-case, it exactly matches the bounds of Gonen-Ron-Shavitt, and has a much simpler proof. More importantly, the running time of this algorithm is connected to the degeneracy of G. This is (essentially) the maximum density of an induced subgraph. For the family of graphs with degeneracy at most alpha, it has a query complexity of widetilde{O}left(frac{n^{1-1/s}}{mu^{1/s}_s} Big(alpha^{1/s} + min{alpha,mu^{1/s}_s}Big)right) = widetilde{O}(n^{1-1/s}alpha/mu^{1/s}_s). Thus, for the class of bounded degeneracy graphs (which includes all minor closed families and preferential attachment graphs), we can estimate the average degree in widetilde{O}(1) queries, and can estimate the variance of the degree distribution in widetilde{O}(sqrt{n}) queries. This is a major improvement over the previous worst-case bounds. Our key insight is in designing an estimator for mu_s that has low variance when G does not have large dense subgraphs

    Sampling Multiple Edges Efficiently

    Get PDF
    We present a sublinear time algorithm that allows one to sample multiple edges from a distribution that is pointwise ?-close to the uniform distribution, in an amortized-efficient fashion. We consider the adjacency list query model, where access to a graph G is given via degree and neighbor queries. The problem of sampling a single edge in this model has been raised by Eden and Rosenbaum (SOSA 18). Let n and m denote the number of vertices and edges of G, respectively. Eden and Rosenbaum provided upper and lower bounds of ?^*(n/? m) for sampling a single edge in general graphs (where O^*(?) suppresses poly(1/?) and poly(log n) dependencies). We ask whether the query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we get an improved amortized per-sample cost if we allow a preprocessing phase? We answer in the affirmative. We present an algorithm that, if one knows the number of required samples q in advance, has an overall cost that is sublinear in q, namely, O^*(? q ?(n/? m)), which is strictly preferable to O^*(q? (n/? m)) cost resulting from q invocations of the algorithm by Eden and Rosenbaum. Subsequent to a preliminary version of this work, T?tek and Thorup (arXiv, preprint) proved that this bound is essentially optimal

    Embeddings and Labeling Schemes for A*

    Get PDF
    A* is a classic and popular method for graphs search and path finding. It assumes the existence of a heuristic function h(u,t) that estimates the shortest distance from any input node u to the destination t. Traditionally, heuristics have been handcrafted by domain experts. However, over the last few years, there has been a growing interest in learning heuristic functions. Such learned heuristics estimate the distance between given nodes based on "features" of those nodes. In this paper we formalize and initiate the study of such feature-based heuristics. In particular, we consider heuristics induced by norm embeddings and distance labeling schemes, and provide lower bounds for the tradeoffs between the number of dimensions or bits used to represent each graph node, and the running time of the A* algorithm. We also show that, under natural assumptions, our lower bounds are almost optimal
    corecore